VI. CORRELATION IN SEASONAL VARIATION OF CLIMATE.


By

Gilbert T. Walker, M.A., Sc.D., F.R.S.

INTRODUCTION. 

A cursory examination of the seasonal variations of any country will show that
some of the departures from average conditions are directly related to other
abnormal features, and the method by which the effects are produced is often fairly
well known, as is the case when diminution of pressure is due to rise of temperature.
Other departures, however, are connected with abnormal features in distant parts
of the earth, and difficulties are experienced in ascertaining not only the nature of
the results due to any variation, but also the chain of causes and effects by
which the results are produced; as examples of the latter may be quoted the favourable
influence upon Indian monsoon rainfall of the conditions which produce high temperature
in the interior of Australia or high pressure in the Argentine Republic. Before attempting
therefore to investigate the phenomena on physical lines it appears desirable to ascertain
by purely empirical methods the character of as many relationships as possible in the
hope of being able to pick out from the results so obtained a number of which the
physical explanation is clear. If we can in this way find the intermediate links in
the chain of causes we may replace an intricate problem by a number of simpler
ones.

2. Empirical methods may be divided into graphical and numerical, and of
these the first are open to some objection. For although curves showing the changes
of magnitude of two closely related quantities will make their connection obvious,
such graphical methods are far from satisfactory when the disturbing factors are
numerous or the connection sought is slight. The same curves have been interpreted
in opposite senses by different authors and in such cases numerical methods like those
of statistics seem inevitable; while in nearly all cases they appear desirable, inasmuch
as they give quantitative instead of qualitative results and are free from subjective
influences. They also afford a criterion of the reliability of a computed relationship
by comparing it with the probable amount of fictitious relationship which we may
expect to be produced by mere accident even in entirely independent factors.

3. It must be remembered that the number of years for which reliable data
are available over a large part of the earth does not exceed thirty, and that some
of the climatic elements with which we are concerned, especially rainfall and
solar activity as measured by sunspot numbers, probably undergo larger percentage
variations than do the quantities to which statistical methods are usually applied.
The deviations from the exponential law of distribution may also be appreciable;
and as the proof ordinarily given of the formula for the correlation coefficient
is, strictly speaking, only applicable to cases in which the exponential law of
distribution applies, it appears desirable to seek some justification which is as free
as possible from hypotheses as to the frequency of occurrence of the variations of
the magnitudes concerned.¹

4. Let us consider two series of $n$ quantities each, $X_1, X_2, \ldots, X_n$ and
$Y_1, Y_2, \ldots, Y_n$; let them be associated in pairs, $X_1$ with $Y_1$, $X_2$ with $Y_2$, &c.,
and let their departures from their respective average² values be $x_1, x_2, \ldots, x_n$
and $y_1, y_2, \ldots, y_n$, so that $Sx = 0$, $Sy = 0$, where $Sx$ stands for the sum of the $n$
terms $x_1, x_2, \ldots, x_n$ and similarly $Sy = y_1 + y_2 + \ldots + y_n$. A primitive way of
ascertaining the extent to which the values of the terms of the one series are affected
by those of the other is that of counting the number of times $p$ in which the values of
$x$ have the same sign as those of $y$, and the number of times $q$ in which the signs are
different. It is obvious that when the relationship is close the values of $x$ will in
most cases have the same sign as those of $y$ if $X$, $Y$ tend to vary in the same direction,
or the opposite sign if $X$, $Y$ tend to vary in opposite directions. If there is very
slight relationship the variations will be nearly independent and $p/q$ will be nearly
equal to unity. Thus the fraction $(p-q)/(p+q)$, of which the numerical value must lie
between unity and zero, might be used to give a rough idea of relationship.
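The counting rule of this paragraph may be sketched in a few lines of Python. The function name `sign_agreement`, the treatment of exact zeros (they are skipped) and the sample departures are assumptions made for illustration; none of them is taken from the paper.

```python
def sign_agreement(x, y):
    """Return (p - q) / (p + q) for paired departures x and y.

    p counts the pairs whose departures have the same sign and q the pairs
    whose signs differ. Pairs containing an exact zero are skipped, which is
    an assumption: the text does not say how zeros should be counted.
    """
    p = q = 0
    for xi, yi in zip(x, y):
        if xi == 0 or yi == 0:
            continue
        if (xi > 0) == (yi > 0):
            p += 1
        else:
            q += 1
    if p + q == 0:
        raise ValueError("no pairs with non-zero departures")
    return (p - q) / (p + q)


# Hypothetical departures (each series sums to zero); not data from the paper.
x = [1.2, -0.8, 0.5, -0.3, -0.6]
y = [0.7, -0.2, -0.1, -0.6, 0.2]
print(sign_agreement(x, y))  # three signs alike, two unlike
```

Only the signs enter the result, so any rearrangement of magnitudes that preserves the pattern of signs leaves the fraction unchanged.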

5. When the number $n$ in the series is small this method will be inaccurate because
it does not take into account the magnitude of the departures from normal. If, for
example, five pairs of values of $x$ and $y$ be

    x    +1.13    -0.03    -0.94    [illegible]    [illegible]
    y    +0.55    +0.04    -0.45    -0.13          -0.01

it will be seen that these indicate a direct relationship. For the values 0.01,
0.02, 0.03, 0.04 are so small as to afford no reliable indications, such small quantities
being completely masked by accidental causes: thus the values of $x$ tend to be about
twice those of $y$. There are also three signs alike and two unlike, so that $(p-q)/(p+q)$
is $+1/5$. If however we consider a different case in which the same numbers occur
in a different arrangement

    x    +1.13    -0.03    -0.94    +0.03    [illegible]
    y    -0.45    -0.13    +0.55    +0.04    -0.01

it will be seen that the second, fourth and fifth pairs afford no reliable indications, and
the first and third establish a relationship of the inverse character, $X$ and $Y$ tending
to vary in opposite directions with values of $x$ numerically about double those of $y$.
The value of $(p-q)/(p+q)$ is still $+1/5$ and is misleading. Thus we cannot rely
upon $(p-q)/(p+q)$ as giving even a rough indication; and the reason of its failure is
that it takes only the signs of the departures into account, not their magnitudes.

¹ Certain authors (e.g. Yule, 'On the theory of correlation,' in the Journal of the Royal Statistical Society, Vol.
LX, December 1897) obtain the correlation coefficient without explicit assumptions as to the frequency of distribution;
but they adopt the method of least squares, and the justification of this depends on the law of distribution.

² The consideration of the effects introduced by using the 'mode' instead of the 'median' value is deferred for
the present.
6. If the variations of $X$ and $Y$ be not independent, it is natural to regard the
departures $x$ of $X$ as made up of a portion governed by $y$ and a portion independent of $y$. If
the values of $y$ be small the portion of $x$ determined by $y$ may, if squares of small
quantities be neglected, be taken as $ky$, where $k$ is a constant independent of $y$; when $y$ is not
small the hypothesis that its effect is proportional to its magnitude appears the simplest
which will approximately represent the facts. Thus we write

$$x_1 = k y_1 + d_1, \quad x_2 = k y_2 + d_2, \quad \ldots, \quad x_n = k y_n + d_n \qquad (1)$$

where $d_1, d_2, \ldots, d_n$ are remainders representing those portions of $x$ which are dependent
on factors other than $y$, and these may, when $y$ alone is considered, be treated as if they
were accidental.


7. In order to determine $k$ a conceivable plan would be to form two groups of
equations, the first containing all those in which $y$ is positive and the second those
in which $y$ is negative. We should, on adding the equations of each group, obtain
two equations of the form

$$S_1 x = k S_1 y + S_1 d, \qquad S_2 x = k S_2 y + S_2 d$$

where $S_1$, $S_2$ indicate summations over equations containing positive and negative
values of $y$. Since $Sx = 0$, $S_1 x + S_2 x = 0$; similarly $S_1 y + S_2 y = 0$, and hence
$S_1 d + S_2 d = 0$. Also if the number of equations is large $S_1 y$ and $S_2 y$ will be large,
while as the $d$'s are independent of the $y$'s, $S_1 d$ will tend to contain as many negative
as positive terms and to be small by comparison with $S_1 y$: thus we might,
when the number of values is large, take $k$ as given by either of the equivalent
equations

$$S_1 x = k S_1 y, \qquad S_2 x = k S_2 y \qquad (2)$$


8. Such a process would however, when $n$ is small, be open to an objection similar
to that of para. 5 above. Thus if we consider, for simplicity, a case in which there are
only four pairs of departures,

    x    +1.03    -0.99    -0.98    +0.94
    y    +0.51    +0.01    -0.49    -0.03

we note that the second and fourth pairs afford no reliable indications owing to the
small value of $y$, while the first and third show that the variations of $X$ are about
double those of $Y$. Further if we interchange the first and second values of $x$, and the
third and fourth, we do not affect $S_1 x$ and $S_2 x$, or $S_1 y$ and $S_2 y$: yet the numbers
become

    x    -0.99    +1.03    +0.94    -0.98
    y    +0.51    +0.01    -0.49    -0.03

and the variations of $X$ are now about minus twice those of $Y$, being in the opposite
direction. The value of $k$ as given by (2) is $1/13$ in either case.
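The arithmetic of this example may be checked with a short Python sketch of the group-sum rule (2). The function name `k_from_group_sums` is hypothetical; the four pairs of departures are those tabulated above.

```python
def k_from_group_sums(x, y):
    """Estimate k by equations (2): S1 x / S1 y and S2 x / S2 y.

    S1 sums over the pairs in which y is positive, S2 over those in which
    y is negative.
    """
    s1x = sum(xi for xi, yi in zip(x, y) if yi > 0)
    s1y = sum(yi for yi in y if yi > 0)
    s2x = sum(xi for xi, yi in zip(x, y) if yi < 0)
    s2y = sum(yi for yi in y if yi < 0)
    return s1x / s1y, s2x / s2y


y = [0.51, 0.01, -0.49, -0.03]
x_first = [1.03, -0.99, -0.98, 0.94]    # first arrangement
x_second = [-0.99, 1.03, 0.94, -0.98]   # arrangement after the interchange

print(k_from_group_sums(x_first, y))    # both values close to 1/13
print(k_from_group_sums(x_second, y))   # unchanged by the interchange
```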
The incorrectness of these results obviously comes from failing to take into
account the smallness of $+0.01$ and $-0.03$: or in other words attaching equal weight
to each of the four equations

$$+1.03 = +0.51\,k + d_1$$
$$-0.99 = +0.01\,k + d_2$$
$$-0.98 = -0.49\,k + d_3$$
$$+0.94 = -0.03\,k + d_4$$

It is obvious that the value of the indications given by those equations in which $y$
is small must be small to a proportionate amount, and hence that weight must be
attached to the equations depending on the value of $y$; also the weight may be taken
as proportional to that value, at any rate as a first approximation.

Thus the equations (1) will, on being weighted, become

$$x_1 y_1 = k y_1^2 + d_1 y_1, \quad x_2 y_2 = k y_2^2 + d_2 y_2, \quad \ldots, \quad x_n y_n = k y_n^2 + d_n y_n$$

and on adding we have

$$Sxy = k\,Sy^2 + Sdy \qquad (3)$$

Now since by definition the $d$'s are independent of the $y$'s there will when $n$ is large
tend to be as large negative contributions to $Sdy$ as positive contributions, and
$Sdy$ will tend to vanish by comparison with $Sy^2$. Thus in the limit we shall have

$$Sxy = k\,Sy^2 \qquad (4)$$
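A minimal sketch of the weighted rule (4), applied to the four pairs of departures of para. 8, shows that the weighting recovers the sense of the relationship which the group sums concealed. The function name `k_weighted` is an assumption made for illustration.

```python
def k_weighted(x, y):
    """Estimate k from equation (4): S xy = k S y^2."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    syy = sum(yi * yi for yi in y)
    return sxy / syy


y = [0.51, 0.01, -0.49, -0.03]
x_first = [1.03, -0.99, -0.98, 0.94]    # first arrangement of para. 8
x_second = [-0.99, 1.03, 0.94, -0.98]   # arrangement after the interchange

print(k_weighted(x_first, y))    # about +1.93
print(k_weighted(x_second, y))   # about -1.85
```

The two arrangements now give values of $k$ near $+2$ and $-2$ respectively, in agreement with what inspection of the first and third pairs suggests.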

9. As the departures of $X$, $Y$ are negative as well as positive it is natural to
define their mean values $s_1$, $s_2$ in terms of the squares of $x$, $y$ by the equations

$$n s_1^2 = Sx^2, \qquad n s_2^2 = Sy^2$$

and it is convenient to introduce a new quantity $r$ defined by the equation

$$n r s_1 s_2 = Sxy \qquad (5)$$

Thus (4) becomes

$$n r s_1 s_2 = k n s_2^2$$

and

$$k = r s_1 / s_2 \qquad (6)$$

10. Let us now ascertain the proportionate extent to which the variations of $X$ are
determined by the variations of $Y$. Of the departures $x_1, x_2, \ldots, x_n$ the amounts
which are determined by $Y$ are $ky_1, ky_2, \ldots, ky_n$, and of these the mean may be
denoted by $m s_1$, where, by our definition of the mean,

$$n m^2 s_1^2 = (ky_1)^2 + (ky_2)^2 + \ldots + (ky_n)^2 = k^2\,Sy^2 = n k^2 s_2^2.$$

Thus $m s_1 = k s_2$, and by (6) $m = r$.

Now the mean value of the variations of $X$ is $s_1$, and the mean value of those portions
which are dependent on $Y$ is, as we have seen, $m s_1$ or $r s_1$. Thus $r$ is the fraction of the
variations of $X$ which is determined by $Y$. Further from the definition (5) of $r$ its value
will not be affected if we replace all the $x$'s by the corresponding $y$'s, and conversely; hence it is also
equal to the fraction of the variations of $Y$ which are related to those of $X$. Thus it
expresses the proportionate extent to which the variations of each are determined by, or
related to, those of the other. It is usually called their correlation coefficient.
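The definitions of paras. 9 and 10 may be collected into a small Python function. This is only a sketch under stated assumptions: the name `correlation` and its return values are hypothetical, the averages are taken as arithmetic means, and the data are the four pairs of para. 8.

```python
import math


def correlation(X, Y):
    """Return (r, k, s1, s2) from n s1^2 = S x^2, n s2^2 = S y^2,
    n r s1 s2 = S xy and k = r s1 / s2, where x and y are departures
    of X and Y from their arithmetic means."""
    n = len(X)
    mean_x = sum(X) / n
    mean_y = sum(Y) / n
    x = [v - mean_x for v in X]
    y = [v - mean_y for v in Y]
    s1 = math.sqrt(sum(v * v for v in x) / n)
    s2 = math.sqrt(sum(v * v for v in y) / n)
    r = sum(a * b for a, b in zip(x, y)) / (n * s1 * s2)
    k = r * s1 / s2
    return r, k, s1, s2


Y = [0.51, 0.01, -0.49, -0.03]
X_first = [1.03, -0.99, -0.98, 0.94]
r, k, s1, s2 = correlation(X_first, Y)
print(round(r, 3), round(k, 3))

# Interchanging the two series leaves r unaltered.
print(round(correlation(Y, X_first)[0], 3))
```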
11. If the value of the $d$'s be derived from equations (1) we shall have

$$Sd^2 = S(x - ky)^2 = n\,(s_1^2 - 2 k r s_1 s_2 + k^2 s_2^2)$$

and if we choose $k$ so as to make $Sd^2$ a minimum we take $r s_1 = k s_2$, as in (6). Thus the
value of $k$ previously obtained makes the sum of the squares of the residuals or accidental
differences $d$ a minimum.
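The minimum property may be verified numerically on the four pairs of para. 8. The sketch below assumes a hypothetical helper `sum_sq_residuals` and compares $Sd^2$ at the value of $k$ given by (4) with its value at two neighbouring trial values; the offset of 0.5 is arbitrary.

```python
def sum_sq_residuals(x, y, k):
    """Return S d^2 = S (x - k y)^2 for a trial value of k."""
    return sum((xi - k * yi) ** 2 for xi, yi in zip(x, y))


y = [0.51, 0.01, -0.49, -0.03]
x = [1.03, -0.99, -0.98, 0.94]

# Value of k from equation (4): S xy = k S y^2.
k_best = sum(xi * yi for xi, yi in zip(x, y)) / sum(yi * yi for yi in y)

for k in (k_best - 0.5, k_best, k_best + 0.5):
    print(round(k, 3), round(sum_sq_residuals(x, y, k), 4))
```

Since $Sd^2$ is a quadratic function of $k$, the middle value printed is the least of the three.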

12. A similar method is applicable when we have three series of related quantities
$X_1, X_2, \ldots, X_n$; $Y_1, Y_2, \ldots, Y_n$; and $Z_1, Z_2, \ldots, Z_n$. As before we shall designate the
departures of $X$, $Y$, $Z$ from their average values by $x_1, x_2, \ldots, x_n$; $y_1, y_2, \ldots, y_n$;
and $z_1, z_2, \ldots, z_n$. Thus $Sx = 0$, $Sy = 0$, $Sz = 0$: we also define the mean values
$s_1$, $s_2$, $s_3$ by the equations

$$n s_1^2 = Sx^2, \qquad n s_2^2 = Sy^2, \qquad n s_3^2 = Sz^2$$

and quantities $r_1$, $r_2$, $r_3$ by the equations

$$n r_1 s_2 s_3 = Syz, \qquad n r_2 s_3 s_1 = Szx, \qquad n r_3 s_1 s_2 = Sxy \qquad (7)$$

We then assume that the variations of $x$ are determined by those of $y$ and $z$ by
equations of the form

$$x_1 = k_2 y_1 + k_3 z_1 + d_1$$
$$x_2 = k_2 y_2 + k_3 z_2 + d_2$$
$$\ldots$$
$$x_n = k_2 y_n + k_3 z_n + d_n \qquad (8)$$

where $d_1, d_2, \ldots, d_n$ are independent of $y$ and $z$; and we are unable to apply our previous
analysis without modification because $y$, $z$ are not independent of each other. We can
however determine a constant $c$ such that $z - cy$ shall be independent of $y$; the condition,
as the previous work has shown, is that $y_1(z_1 - cy_1) + y_2(z_2 - cy_2) + \ldots + y_n(z_n - cy_n)$
shall vanish, i.e.,

$$Syz - c\,Sy^2 = 0 \qquad (9)$$

whence

$$n r_1 s_2 s_3 - n c s_2^2 = 0$$

or

$$c s_2 = r_1 s_3 \qquad (10)$$

Writing now the equations (8) in the form

$$x_1 = (k_2 + c k_3)\,y_1 + k_3 (z_1 - c y_1) + d_1$$
$$x_2 = (k_2 + c k_3)\,y_2 + k_3 (z_2 - c y_2) + d_2$$
$$\ldots$$
$$x_n = (k_2 + c k_3)\,y_n + k_3 (z_n - c y_n) + d_n \qquad (11)$$

we note that all the terms $(z - cy)$ and $d$ are independent of $y$, and our former analysis is
applicable. We can thus deduce the value of $(k_2 + c k_3)$ by multiplying the equations (11)
by $y_1, y_2, \ldots, y_n$, omitting the $d$'s and adding; we obtain

$$Sxy = (k_2 + c k_3)\,Sy^2$$

whence by (9)

$$Sxy = k_2\,Sy^2 + k_3\,Syz \qquad (12)$$

or

$$r_3 s_1 = k_2 s_2 + k_3 r_1 s_3 \qquad (13)$$

It will be seen that the equation (12) is what would be obtained if we had multiplied
our original equations (8) by $y_1, y_2, \ldots, y_n$, omitted the $d$'s and added. In the
same manner we may obtain the equation

$$Sxz = k_2\,Syz + k_3\,Sz^2$$

or

$$r_2 s_1 = k_2 r_1 s_2 + k_3 s_3 \qquad (14)$$



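From (13) and (14) we find

$$k_2 = \frac{s_1 (r_3 - r_1 r_2)}{s_2 (1 - r_1^2)}, \qquad k_3 = \frac{s_1 (r_2 - r_3 r_1)}{s_3 (1 - r_1^2)} \qquad (15)$$

The proportionate extent to which the variations of $X$ are governed by those of $Y$
and $Z$ is the ratio to $s_1$ of the mean value of $k_2 y + k_3 z$, i.e. the proportionate extent
is $m$ where

$$n m^2 s_1^2 = S(k_2 y + k_3 z)^2 = n\,(k_2^2 s_2^2 + 2 k_2 k_3 r_1 s_2 s_3 + k_3^2 s_3^2)$$

On substitution and reduction we obtain

$$m^2 = \frac{r_2^2 + r_3^2 - 2 r_1 r_2 r_3}{1 - r_1^2}$$

and the effective correlation coefficient of $x$ with $y$ and $z$ is

$$\sqrt{\frac{r_2^2 + r_3^2 - 2 r_1 r_2 r_3}{1 - r_1^2}}$$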
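The formulae (15) and the expression for $m$ may be tried on small artificial series. In the sketch below the function names and both sets of data are hypothetical; the first series is constructed so that $X = 2Y + 3Z$ exactly, in which case the formulae should return $k_2 = 2$, $k_3 = 3$ and $m = 1$.

```python
import math


def center(values):
    """Departures of a series from its arithmetic mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def two_predictor_fit(X, Y, Z):
    """Return (k2, k3, m) from equations (7), (15) and the formula for m."""
    x, y, z = center(X), center(Y), center(Z)
    n = len(x)
    s1, s2, s3 = (math.sqrt(dot(v, v) / n) for v in (x, y, z))
    r1 = dot(y, z) / (n * s2 * s3)
    r2 = dot(z, x) / (n * s3 * s1)
    r3 = dot(x, y) / (n * s1 * s2)
    k2 = s1 * (r3 - r1 * r2) / (s2 * (1 - r1 ** 2))
    k3 = s1 * (r2 - r3 * r1) / (s3 * (1 - r1 ** 2))
    m = math.sqrt((r2 ** 2 + r3 ** 2 - 2 * r1 * r2 * r3) / (1 - r1 ** 2))
    return k2, k3, m


# Hypothetical series, not data from the paper.
Y = [1.0, 2.0, 3.0, 4.0, 5.0]
Z = [2.0, 1.0, 4.0, 3.0, 6.0]
X_exact = [2 * yi + 3 * zi for yi, zi in zip(Y, Z)]   # exact linear dependence
X_noisy = [5.0, 1.0, 4.0, 2.0, 3.0]                   # no exact dependence

print(two_predictor_fit(X_exact, Y, Z))
print(two_predictor_fit(X_noisy, Y, Z))
```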

13. The case of four variables may be treated similarly. If $x$, $y$, $z$, $w$ be representative
departures of four variables $X$, $Y$, $Z$, $W$ from their average values, we shall define
$s_1$, $s_2$, $s_3$, $s_4$, $r_{12}$, $r_{13}$, $r_{14}$, $r_{23}$, $r_{24}$, $r_{34}$ by the equations

$$Sx^2 = n s_1^2, \quad Sy^2 = n s_2^2, \quad Sz^2 = n s_3^2, \quad Sw^2 = n s_4^2,$$
$$Sxy = n r_{12} s_1 s_2, \quad Sxz = n r_{13} s_1 s_3, \quad Sxw = n r_{14} s_1 s_4,$$
$$Syz = n r_{23} s_2 s_3, \quad Syw = n r_{24} s_2 s_4, \quad Szw = n r_{34} s_3 s_4.$$

Then as before we assume equations of the form

$$x_1 = a_{12} y_1 + a_{13} z_1 + a_{14} w_1 + d_1$$
$$x_2 = a_{12} y_2 + a_{13} z_2 + a_{14} w_2 + d_2$$
$$\ldots$$
$$x_n = a_{12} y_n + a_{13} z_n + a_{14} w_n + d_n$$

On multiplying by $y_1, y_2, \ldots, y_n$, omitting the $d$'s and adding we obtain

$$r_{12} s_1 = a_{12} s_2 + a_{13} r_{23} s_3 + a_{14} r_{24} s_4 \qquad (17)$$

Similarly multiplying by $z_1, z_2, \ldots, z_n$ gives

$$r_{13} s_1 = a_{12} r_{23} s_2 + a_{13} s_3 + a_{14} r_{34} s_4 \qquad (18)$$

and by $w_1, w_2, \ldots, w_n$

$$r_{14} s_1 = a_{12} r_{24} s_2 + a_{13} r_{34} s_3 + a_{14} s_4 \qquad (19)$$

The equations (17), (18) and (19) are sufficient to give $a_{12}$, $a_{13}$ and $a_{14}$.

Hence we find after some algebraic reduction

$$a_{12} = \frac{s_1}{s_2} \cdot \frac{r_{12}(1 - r_{34}^2) + r_{13}(r_{24} r_{34} - r_{23}) + r_{14}(r_{23} r_{34} - r_{24})}{1 - r_{23}^2 - r_{24}^2 - r_{34}^2 + 2 r_{23} r_{24} r_{34}}$$

with corresponding values for $a_{13}$, $a_{14}$, &c.

Defining as before the correlation coefficient $m$ of $x$ with $y$, $z$, $w$, as the proportionate
extent to which the variations $x$ are determined by $y$, $z$, and $w$ we have, as before,

$$m^2 s_1^2 = a_{12}^2 s_2^2 + a_{13}^2 s_3^2 + a_{14}^2 s_4^2 + 2 a_{13} a_{14} r_{34} s_3 s_4 + 2 a_{14} a_{12} r_{24} s_2 s_4 + 2 a_{12} a_{13} r_{23} s_2 s_3 \qquad (20)$$

$$= s_1 (a_{12} r_{12} s_2 + a_{13} r_{13} s_3 + a_{14} r_{14} s_4)$$

by (17), (18) and (19).


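Thus

$$m^2 s_1 \begin{vmatrix} 1 & r_{23} & r_{24} \\ r_{23} & 1 & r_{34} \\ r_{24} & r_{34} & 1 \end{vmatrix} = \begin{vmatrix} a_{12} r_{12} s_2 + a_{13} r_{13} s_3 + a_{14} r_{14} s_4 & 0 & 0 & 0 \\ r_{12} & 1 & r_{23} & r_{24} \\ r_{13} & r_{23} & 1 & r_{34} \\ r_{14} & r_{24} & r_{34} & 1 \end{vmatrix}$$

On multiplying the 2nd, 3rd and 4th rows by $-a_{12} s_2$, $-a_{13} s_3$, $-a_{14} s_4$ and adding to
the first, this becomes, on using (17), (18) and (19)

$$m^2 s_1 \begin{vmatrix} 1 & r_{23} & r_{24} \\ r_{23} & 1 & r_{34} \\ r_{24} & r_{34} & 1 \end{vmatrix} = \begin{vmatrix} 0 & -r_{12} s_1 & -r_{13} s_1 & -r_{14} s_1 \\ r_{12} & 1 & r_{23} & r_{24} \\ r_{13} & r_{23} & 1 & r_{34} \\ r_{14} & r_{24} & r_{34} & 1 \end{vmatrix}$$

or

$$m^2 \begin{vmatrix} 1 & r_{23} & r_{24} \\ r_{23} & 1 & r_{34} \\ r_{24} & r_{34} & 1 \end{vmatrix} = - \begin{vmatrix} 0 & r_{12} & r_{13} & r_{14} \\ r_{12} & 1 & r_{23} & r_{24} \\ r_{13} & r_{23} & 1 & r_{34} \\ r_{14} & r_{24} & r_{34} & 1 \end{vmatrix}$$

Hence $m$ is given by

$$m^2 = - \begin{vmatrix} 0 & r_{12} & r_{13} & r_{14} \\ r_{12} & 1 & r_{23} & r_{24} \\ r_{13} & r_{23} & 1 & r_{34} \\ r_{14} & r_{24} & r_{34} & 1 \end{vmatrix} \Bigg/ \begin{vmatrix} 1 & r_{23} & r_{24} \\ r_{23} & 1 & r_{34} \\ r_{24} & r_{34} & 1 \end{vmatrix} \qquad (21)$$

14. Another theorem follows from noticing that the equations (17), (18) and (19) are
equivalent to

$$Sdy = 0, \qquad Sdz = 0, \qquad Sdw = 0$$

and hence

$$Sd\,(a_{12} y + a_{13} z + a_{14} w) = 0 \qquad (22)$$

i.e.

$$Sd\,(x - d) = 0$$

Thus

$$Sd^2 = Sdx \qquad (23)$$

But

$$d_1^2 = x_1^2 - 2 x_1 (a_{12} y_1 + a_{13} z_1 + a_{14} w_1) + (a_{12} y_1 + a_{13} z_1 + a_{14} w_1)^2$$

and hence summing the similar equations

$$Sd^2 = Sx^2 - 2\,Sx(x - d) + m^2\,Sx^2$$

where the last term follows from the definition of $m$.

Thus by (23)

$$Sd^2 = Sx^2 - 2\,Sx^2 + 2\,Sd^2 + m^2\,Sx^2$$

i.e.

$$Sd^2 = (1 - m^2)\,Sx^2$$

Thus the mean value of $d$ is

$$s_1 (1 - m^2)^{1/2} \qquad (24)$$

and the more closely the correlation coefficient $m$ approaches unity the smaller is the
mean value of $d$.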

15. It is natural after trying to explain the variations of $X$ in terms of those of $Y$,
$Z$ and $W$ in any climatic problem to evaluate the quantities

$$a_{12} y_1 + a_{13} z_1 + a_{14} w_1$$
$$a_{12} y_2 + a_{13} z_2 + a_{14} w_2$$
$$\ldots$$
$$a_{12} y_n + a_{13} z_n + a_{14} w_n \qquad (25)$$

which are the values of $x$ that would be inferred from known values of the $y$'s, $z$'s and $w$'s.
After evaluating these it is natural to work out their correlation coefficient with the
actual values of $x$. The sum of the squares of the quantities (25) is, by the definition of
$m$, equal to $n m^2 s_1^2$, and hence the correlation coefficient of the series (25) with the $x$'s is

$$\frac{S\,x\,(a_{12} y + a_{13} z + a_{14} w)}{n\,m\,s_1^2}$$

or

$$\frac{S\,[(a_{12} y + a_{13} z + a_{14} w) + d]\,(a_{12} y + a_{13} z + a_{14} w)}{n\,m\,s_1^2}$$

Now in the numerator

$$S\,(a_{12} y + a_{13} z + a_{14} w)^2 = n m^2 s_1^2$$

and, by (22),

$$S\,d\,(a_{12} y + a_{13} z + a_{14} w) = 0.$$

Hence our fraction becomes $n m^2 s_1^2 / n m s_1^2$, or $m$, the correlation coefficient of $X$ with the
variables, as defined by the proportionate extent to which its variations are governed by
those of $y$, $z$, and $w$.

16. Another consequence of (17), (18) and (19) is easily seen to be that the
coefficients $a_{12}$, &c., are so chosen as to make $Sd^2$ a minimum. The results of this and the
three previous paragraphs are obviously applicable to any number of variables.
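The results of paras. 13 to 16 may be checked together on a small artificial example. The sketch below is written under stated assumptions: the function names, the use of Cramer's rule to solve (17), (18) and (19), and the six sets of values are all hypothetical and chosen for illustration only. It evaluates $m$ from the ratio of determinants, computes the remainders $d$, and allows the relation $Sd^2 = (1 - m^2)\,Sx^2$ and the statement of para. 15 to be compared with direct calculation.

```python
import math


def det(a):
    """Determinant by expansion along the first row (small matrices only)."""
    if len(a) == 1:
        return a[0][0]
    total = 0.0
    for j, entry in enumerate(a[0]):
        minor = [row[:j] + row[j + 1:] for row in a[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def center(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def fit_three_predictors(X, Y, Z, W):
    """Return (m, [a12, a13, a14], x, fitted, d) for departures x, y, z, w."""
    series = [center(v) for v in (X, Y, Z, W)]
    n = len(X)
    s = [math.sqrt(dot(v, v) / n) for v in series]
    r = [[dot(series[i], series[j]) / (n * s[i] * s[j]) for j in range(4)]
         for i in range(4)]
    inner = [row[1:] for row in r[1:]]     # correlations among y, z, w
    bordered = [row[:] for row in r]
    bordered[0][0] = 0.0                   # zero in the leading place
    m = math.sqrt(-det(bordered) / det(inner))
    # Solve (17), (18), (19) for a12 s2, a13 s3, a14 s4 by Cramer's rule.
    rhs = [s[0] * r[0][j] for j in (1, 2, 3)]
    a = []
    for col in range(3):
        replaced = [row[:] for row in inner]
        for i in range(3):
            replaced[i][col] = rhs[i]
        a.append(det(replaced) / det(inner) / s[col + 1])
    x, y, z, w = series
    fitted = [a[0] * yi + a[1] * zi + a[2] * wi for yi, zi, wi in zip(y, z, w)]
    d = [xi - fi for xi, fi in zip(x, fitted)]
    return m, a, x, fitted, d


# Hypothetical values, not data from the paper.
X = [5.0, 1.0, 4.0, 2.0, 3.0, 6.0]
Y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
Z = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
W = [1.0, 0.0, 0.0, 1.0, 1.0, 0.0]

m, a, x, fitted, d = fit_three_predictors(X, Y, Z, W)
print("m from determinants:", m)
print("S d^2:", dot(d, d), " (1 - m^2) S x^2:", (1 - m * m) * dot(x, x))
print("correlation of inferred with actual x:",
      dot(fitted, x) / math.sqrt(dot(fitted, fitted) * dot(x, x)))
```

Expansion along the first row is adequate for determinants of this order; for a larger number of variables a standard linear-algebra routine would be preferable.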


RESULT OF ASSUMING THE EXPONENTIAL LAW OF DISTRIBUTION IN THE VARIATIONS.

17. The values of the correlation coefficients have been obtained by making the
simplest hypotheses. The equations (17), (18) and (19) follow at once from assuming
the exponential law of distribution, and Pearson has shown that when that law holds the
probable error of the coefficient $r$ of correlation between two variables is $0.67449\,(1 - r^2)/\sqrt{n}$,
where $n$ is the number of pairs of values.* This result will frequently be utilised in
the succeeding papers on this subject as an approximation when the distribution law of
variation does not differ far from the exponential law.


* Philosophical Transactions, London, Vol. 191, p. [illegible] (1898).
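The probable error quoted in para. 17 is easily evaluated. The function name `probable_error` and the sample values $r = 0.5$, $n = 30$ are assumptions made for illustration; thirty is taken only because para. 3 names it as the limit of the years of reliable data.

```python
import math


def probable_error(r, n):
    """Probable error of a correlation coefficient r from n pairs of values,
    0.67449 (1 - r^2) / sqrt(n), valid when the exponential law holds."""
    return 0.67449 * (1 - r ** 2) / math.sqrt(n)


# Hypothetical case: r = 0.5 from thirty pairs of values.
print(round(probable_error(0.5, 30), 4))
```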